28 research outputs found

    Tag-based user profiling: A game theoretic approach


    Report from Dagstuhl Seminar 23031: Frontiers of Information Access Experimentation for Research and Education

    This report documents the program and the outcomes of Dagstuhl Seminar 23031, "Frontiers of Information Access Experimentation for Research and Education", which brought together 37 participants from 12 countries. The seminar addressed technology-enhanced information access (information retrieval, recommender systems, natural language processing) and specifically focused on developing more responsible experimental practices leading to more valid results, both for research and for scientific education. The seminar brought together experts from various sub-fields of information access, namely IR, RS, NLP, information science, and human-computer interaction, to create a joint understanding of the problems and challenges presented by next-generation information access systems, from both the research and the experimentation points of view, to discuss existing solutions and impediments, and to propose next steps to be pursued in the area in order to improve not only our research methods and findings but also the education of the new generation of researchers and developers. The seminar featured a series of long and short talks delivered by participants, which helped set common ground and surface topics of interest to be explored as the main output of the seminar. This led to the formation of five groups that investigated challenges, opportunities, and next steps in the following areas: reality check, i.e. conducting real-world studies; human-machine-collaborative relevance judgment frameworks; overcoming methodological challenges in information retrieval and recommender systems through awareness and education; results-blind reviewing; and guidance for authors.

    BeaSku at CheckThat! 2021: Fine-tuning sentence BERT with triplet loss and limited data

    Misinformation and disinformation are growing problems online. The negative consequences of the proliferation of false claims became especially apparent during the COVID-19 pandemic. Thus, there is a need to detect and to track false claims. However, this is a slow and time-consuming process, especially when done manually. At the same time, the same claims, with small variations, spread simultaneously across many accounts and even across different platforms. One promising approach is to develop systems for detecting new instances of claims that have been previously fact-checked online, as in the CLEF-2021 CheckThat! Lab Task-2b. Here we describe our system for this task. We fine-tuned sentence BERT using triplet loss, and we experimented with two types of augmented datasets. We further combined BM25 scores with language model similarity scores as features in a reranker. The official evaluation results placed our BeaSku system second.
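
    As a rough sketch of the triplet-loss fine-tuning step described above, the snippet below uses the sentence-transformers library; the base checkpoint, hyperparameters, and toy triplets are illustrative assumptions rather than the authors' actual configuration.

        from torch.utils.data import DataLoader
        from sentence_transformers import SentenceTransformer, InputExample, losses

        # Assumed base checkpoint; the system fine-tunes a sentence BERT model, but the exact one is not stated here.
        model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

        # Each triplet pairs an input claim (anchor) with a matching fact-checked claim (positive)
        # and a non-matching one (negative). These examples are invented for illustration.
        train_examples = [
            InputExample(texts=[
                "5G towers are spreading the coronavirus",                          # anchor
                "There is no evidence that 5G networks cause or spread COVID-19",   # positive
                "Face masks are required on all public transport",                  # negative
            ]),
        ]

        train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
        train_loss = losses.TripletLoss(model=model)

        # Fine-tune so that anchors end up closer to their positives than to their negatives.
        model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)

        # At inference time, previously fact-checked claims can be ranked by embedding similarity to the
        # input claim, and those similarity scores combined with BM25 scores as re-ranker features.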

    Dental care protocol based on visual supports for children with autism spectrum disorders

    Background: Subjects with Autism Spectrum Disorders (ASDs) often have difficulty accepting dental treatments. The aim of this study is to propose a dental care protocol based on visual supports to help children with ASDs undergo oral examination and treatments. Material and Methods: 83 children (age range 6-12 years) with a signed consent form were enrolled; intellectual level, verbal fluency, and cooperation grade were evaluated. Children were introduced into a four-stage path in order to undergo an oral examination (stage 1), a professional oral hygiene session (stage 2), sealants (stage 3), and, if necessary, a restorative treatment (stage 4). Each stage came after a visual training, performed by a psychologist (stage 1) and by parents at home (stages 2, 3, and 4). The association between acceptance rates at each stage and gender, intellectual level, verbal fluency, and cooperation grade was tested with the chi-square test, where appropriate. Results: Seventy-seven (92.8%) subjects completed both stages 1 and 2. Six (7.2%) refused stage 3, and among the 44 subjects who needed restorative treatment, only three refused it. The acceptance rate at each stage was significantly associated with verbal fluency (p=0.02, p=0.04, and p=0.01 for stages 1, 3, and 4, respectively). In stage 2, all subjects agreed to move to the next stage. The verbal/intellectual/cooperation dummy variable was significantly associated with the acceptance rate (p<0.01). Conclusions: The use of visual supports was shown to help children with ASDs undergo dental treatments, even non-verbal children with a low intellectual level, underlining that a behavioural approach should be the first strategy for treating patients with ASDs in the dental setting.
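
    As a minimal illustration of the chi-square association tests mentioned above, the snippet below runs a chi-square test of independence on a made-up 2x2 contingency table; the counts are invented for illustration and are not the study's data.

        from scipy.stats import chi2_contingency

        # Hypothetical contingency table: rows = verbal vs. non-verbal children,
        # columns = accepted vs. refused a given stage (counts are illustrative only).
        table = [
            [40, 3],   # verbal: accepted, refused
            [30, 10],  # non-verbal: accepted, refused
        ]

        chi2, p, dof, expected = chi2_contingency(table)
        print(f"chi2 = {chi2:.2f}, p = {p:.3f}, dof = {dof}")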

    Perspectives on Large Language Models for Relevance Judgment

    When asked, current large language models (LLMs) like ChatGPT claim that they can assist us with relevance judgments. Many researchers think this would not lead to credible IR research. In this perspective paper, we discuss possible ways for LLMs to assist human experts, along with the concerns and issues that arise. We devise a human-machine collaboration spectrum that allows categorizing different relevance judgment strategies based on how much the human relies on the machine. For the extreme point of "fully automated assessment", we further include a pilot experiment on whether LLM-based relevance judgments correlate with judgments from trained human assessors. We conclude the paper by providing two opposing perspectives - for and against the use of LLMs for automatic relevance judgments - and a compromise perspective, informed by our analyses of the literature, our preliminary experimental evidence, and our experience as IR researchers. We hope to start a constructive discussion within the community to avoid a stalemate during review, where work is damned if it uses LLMs for evaluation and damned if it doesn't.
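
    As a rough sketch of the kind of agreement analysis such a pilot experiment involves, the snippet below compares LLM-produced relevance labels against a trained assessor's labels using Cohen's kappa; the labels are invented placeholders, not data from the paper, and the paper's actual correlation measures may differ.

        from sklearn.metrics import cohen_kappa_score

        # Hypothetical graded relevance labels (0 = not relevant, 1 = partially relevant, 2 = highly relevant)
        # for the same ten query-document pairs, judged by a human assessor and by an LLM.
        human_labels = [2, 0, 1, 2, 0, 0, 1, 2, 1, 0]
        llm_labels   = [2, 0, 1, 1, 0, 1, 1, 2, 0, 0]

        # Quadratic weighting accounts for the ordinal nature of graded judgments.
        kappa = cohen_kappa_score(human_labels, llm_labels, weights="quadratic")
        print(f"quadratic-weighted kappa: {kappa:.2f}")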

    Enabling Performance Prediction in Information Retrieval Evaluation


    Modelling and Explaining IR System Performance Towards Predictive Evaluation

    Information Retrieval (IR) systems play a fundamental role in many modern commodities, including Search Engines (SE), digital libraries, recommender systems, and social networks. The IR task is particularly challenging because of the volatility of IR system performance: users' information needs change daily, and so do the documents to be retrieved and the notion of what is relevant to a given information need. Nevertheless, the empirical evaluation of an IR system is a costly and slow post-hoc procedure that happens after the system's deployment. Given the challenges of empirical IR evaluation, predicting a system's performance before its deployment would add significant value to the development of an IR system. In this manuscript, we lay the cornerstone for the prediction of IR performance by considering two closely related areas: the modeling of IR system performance and Query Performance Prediction (QPP). The former allows us to identify the features that have the greatest impact on performance and that can serve as predictors, while the latter provides a starting point for instantiating the predictive task in IR. Concerning the modeling of IR performance, we first investigate one of the most popular statistical tools, ANOVA, by comparing traditional ANOVA with a more recent approach, bootstrap ANOVA. Secondly, using ANOVA, we study the concept of topic difficulty and observe that topic difficulty is not an intrinsic property of the information need but stems from the formulation used to represent the topic. Finally, we show how to use Generalized Linear Models (GLMs) as an alternative to the traditional linear modeling of IR performance, and that GLMs provide more powerful inference with comparable stability. Our analyses in the QPP domain start with a predictor used to select, among a set of reformulations of the same information need, the best-performing one for the systematic review task. Secondly, we investigate how to classify queries as either semantic or lexical in order to predict whether neural models will outperform lexical ones. Finally, given the challenges observed in evaluating the previous approaches, we devise a new evaluation procedure, dubbed sMARE. sMARE moves from single-point estimation of performance to a distributional one, enabling improved comparisons between QPP models and more precise analyses.
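
    As a rough illustration of the evaluation idea behind sMARE, the sketch below computes, for each query, a scaled absolute rank error (the normalized distance between the rank a QPP model assigns to the query and the rank induced by its measured effectiveness) and averages it over queries; the exact formulation in the thesis may differ, and the scores used here are invented.

        import numpy as np

        def smare(predictor_scores, actual_effectiveness):
            """Scaled Mean Absolute Rank Error: mean over queries of
            |rank under the predictor - rank under measured effectiveness| / number of queries."""
            predictor_scores = np.asarray(predictor_scores, dtype=float)
            actual_effectiveness = np.asarray(actual_effectiveness, dtype=float)
            n = len(predictor_scores)

            # Rank 1 = query predicted (or measured) to perform best, i.e. descending order.
            pred_rank = np.empty(n, dtype=int)
            pred_rank[np.argsort(-predictor_scores)] = np.arange(1, n + 1)
            true_rank = np.empty(n, dtype=int)
            true_rank[np.argsort(-actual_effectiveness)] = np.arange(1, n + 1)

            sare = np.abs(pred_rank - true_rank) / n  # per-query scaled absolute rank error
            return sare.mean()

        # Invented example: QPP scores for five queries vs. their measured Average Precision.
        qpp_scores = [0.9, 0.4, 0.7, 0.2, 0.6]
        ap_values  = [0.55, 0.10, 0.60, 0.20, 0.35]
        print(f"sMARE = {smare(qpp_scores, ap_values):.3f}")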